Notes on Applicability of GPT-4 to Document Understanding
We perform a reproducible evaluation, so far missing from the literature, of all publicly available GPT-4 family models in the field of Document Understanding, where it is frequently necessary to comprehend the spatial arrangement of text and visual cues in addition to textual semantics. Benchmark results indicate that although it is hard to achieve satisfactory results with text-only models, GPT-4 Vision Turbo performs well when given both the text recognized by an external OCR engine and the document images as input. The evaluation is followed by analyses that suggest possible contamination of the textual GPT-4 models and indicate a significant performance drop for lengthy documents.
MacBehaviour: An R package for behavioural experimentation on large language models
Duan, Xufeng, Li, Shixuan, Cai, Zhenguang G.
There has been increasing interest in investigating the behaviours of large language models (LLMs) and LLM-powered chatbots by treating an LLM as a participant in a psychological experiment. We therefore developed an R package called "MacBehaviour" that aims to interact with more than 60 language models in one package (e.g., OpenAI's GPT family, the Claude family, Gemini, the Llama family, and open-source models) and to streamline the process of running LLM behaviour experiments. The package offers a comprehensive set of functions designed for LLM experiments, covering experiment design, stimuli presentation, model behaviour manipulation, and the logging of responses and token probabilities. To demonstrate the utility and effectiveness of "MacBehaviour," we conducted three validation experiments on three LLMs (GPT-3.5, Llama-2 7B, and Vicuna-1.5 13B) to replicate the sound-gender association in LLMs. The results consistently showed that these models exhibit human-like tendencies to infer gender from novel personal names based on their phonology, as previously demonstrated (Cai et al., 2023). In summary, "MacBehaviour" is an R package for machine behaviour studies that offers a user-friendly interface and comprehensive features to simplify and standardize the experimental process.
A Side-by-side Comparison of Transformers for English Implicit Discourse Relation Classification
Lee, Bruce W., Yang, BongSeok, Lee, Jason Hyung-Jong
Though discourse parsing can help multiple NLP fields, no broad language model search has been done on implicit discourse relation classification. This hinders researchers from fully utilizing publicly available models in discourse analysis. This work is a straightforward comparison of the discourse performance of seven fine-tuned pre-trained language models (PLMs). We use PDTB-3, a popular discourse relation annotated dataset. Through our model search, we raise the state of the art (SOTA) to an accuracy of 0.671 and obtain novel observations. Some are contrary to what has been reported before (Shi and Demberg, 2019b): we find that sentence-level pre-training objectives (NSP, SBO, SOP) generally fail to produce the best-performing model for implicit discourse relation classification. Counterintuitively, similar-sized PLMs with MLM and full attention led to better performance.
Looking at how people text - Assist Blog
In this nascent automated messaging space, we think of one-sentence booking as the holy grail of efficiency, and we spend hours and hours trying to understand the sentence structure and identify the values that will make a query successful. Leveraging natural language processing (NLP) to identify the city, the check-in date and the check-out date is what everyone is working on. While I believe that understanding a full sentence lays the foundation of understanding, the way people interact with a bot can vary from one full sentence to groups of snippets, with users rectifying typos or even changing their minds. Most bots are built to process every input so that it triggers another question or action within the same experience. If you are a conversational developer, you are probably familiar with how a message is received from the user, processed and sent back.
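To make the receive, process and reply loop concrete, here is a minimal sketch of a bot that keeps booking values across several messages, so that a user can send one full sentence or a series of snippets and still change their mind. Everything in it is an assumption made for illustration: the `BookingState` class, the `KNOWN_CITIES` list, the ISO date pattern and the reply strings are hypothetical and do not come from any particular bot framework or from Assist itself.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lookup list; a real bot would use an NLP entity extractor.
KNOWN_CITIES = ("paris", "london", "new york")
# Assumption: dates arrive as YYYY-MM-DD; real users write dates in many ways.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass
class BookingState:
    """Values collected so far in one conversation."""
    city: Optional[str] = None
    check_in: Optional[str] = None
    check_out: Optional[str] = None

    def missing(self) -> List[str]:
        fields = ("city", "check_in", "check_out")
        return [name for name in fields if getattr(self, name) is None]


def update_state(state: BookingState, message: str) -> BookingState:
    """Merge whatever values one message contains into the running state."""
    text = message.lower()

    # If several known cities appear, keep the one mentioned last in the text.
    # A city in a later message overrides an earlier one (user changed mind).
    found = [(text.rfind(city), city) for city in KNOWN_CITIES if city in text]
    if found:
        state.city = max(found)[1]

    dates = DATE_PATTERN.findall(text)
    if len(dates) >= 2:
        # Two dates in one message: treat them as check-in, then check-out.
        state.check_in, state.check_out = dates[0], dates[1]
    elif len(dates) == 1:
        # A single date fills check-in first, otherwise sets check-out.
        if state.check_in is None:
            state.check_in = dates[0]
        else:
            state.check_out = dates[0]
    return state


def next_prompt(state: BookingState) -> str:
    """Ask for the first missing value, or confirm once all are known."""
    missing = state.missing()
    if missing:
        return f"Please provide: {missing[0]}"
    return f"Searching {state.city} from {state.check_in} to {state.check_out}"


if __name__ == "__main__":
    state = BookingState()
    for snippet in ("hotel in Paris", "2024-05-01",
                    "actually London, until 2024-05-04"):
        update_state(state, snippet)
        print(next_prompt(state))
```

The point of the design is that the state outlives any single message: each input only fills or overwrites the values it mentions, and the bot asks for whatever is still missing instead of requiring one perfect sentence.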
The Future of Search
Peter Norvig, Google's director of research, is an expert at building machines that answer tough questions. An authority on programming languages and artificial intelligence, he has written an oft-cited book on AI (Artificial Intelligence: A Modern Approach), has taught at the University of California, Berkeley, and the University of Southern California, and was the head of computational sciences at NASA. In 2001, Norvig came to Google to be the director of search quality. Four years later, he became Google's director of research, overseeing about 100 researchers who investigate topics that range from networking to machine translation. Technology Review spoke with Norvig to get a hint of what we can expect from search technology in the years to come.